## CastLab

# **Convolution Accelerator Design Project**

## 1. Objective

- Implement a hardware accelerator that performs the convolution computation of a convolutional neural network (CNN)

## 2. Required Specification

- Input feature map size: 128 x 128 x 3 (Width x Height x Channel)
- Kernel size: 3 x 3 x 3 x 3 (Width x Height x Input Channel x Output Channel)
- Output feature map size: 128 x 128 x 3 (Width x Height x Channel)
- Input feature map bit-width: 16-bit (Sign=1, Integer=7, Fraction=8)
- Kernel bit-width: 8-bit (Sign=1, Integer=1, Fraction=6)
- Output feature map bit-width: 16-bit (Sign=1, Integer=7, Fraction=8)
- Stride: 1
- Zero padding: True
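
The specification fixes the data formats but not the arithmetic details, so the following Python reference model is only a sketch. It assumes that products are accumulated at full precision across the three input channels, that the 14 fraction bits of an input-times-kernel product are reduced to 8 by an arithmetic right shift of 6 (truncation), and that the result saturates to 16 bits. None of these choices is stated in the project description, and the names `conv2d_fixed` and `saturate` are hypothetical.

```python
# Reference-model sketch. Truncation and saturation are ASSUMPTIONS,
# not behaviour confirmed by the project description.
IN_FRAC = 8    # input feature map: 1 sign, 7 integer, 8 fraction bits
K_FRAC = 6     # kernel: 1 sign, 1 integer, 6 fraction bits
OUT_FRAC = 8   # output feature map: 1 sign, 7 integer, 8 fraction bits


def saturate(value, bits):
    """Clamp a signed integer to the range of a two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def conv2d_fixed(ifmap, kernels):
    """3x3 convolution with stride 1 and one pixel of zero padding.

    ifmap[c][y][x]        raw 16-bit integers (value = raw / 2**8)
    kernels[o][c][ky][kx] raw 8-bit integers  (value = raw / 2**6)
    Returns ofmap[o][y][x] as raw 16-bit integers.
    """
    channels = len(ifmap)
    height = len(ifmap[0])
    width = len(ifmap[0][0])
    shift = IN_FRAC + K_FRAC - OUT_FRAC  # 14 fraction bits down to 8
    ofmap = []
    for kernel in kernels:
        plane = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                acc = 0
                for c in range(channels):
                    for ky in range(3):
                        for kx in range(3):
                            iy, ix = y + ky - 1, x + kx - 1
                            # Pixels outside the image contribute zero.
                            if 0 <= iy < height and 0 <= ix < width:
                                acc += ifmap[c][iy][ix] * kernel[c][ky][kx]
                # Python's >> on negative ints is an arithmetic shift.
                plane[y][x] = saturate(acc >> shift, 16)
        ofmap.append(plane)
    return ofmap
```

Because zero padding is enabled and the stride is 1, the output planes have the same width and height as the input, which matches the 128 x 128 x 3 output size listed above. A model like this could serve as a golden reference when comparing against the accelerator's output, provided its rounding matches the hardware.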



## 3. Overall Design

- Weight-stationary systolic array architecture
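
To illustrate the weight-stationary idea, the following Python sketch models a single column of processing elements (PEs). Each PE holds one weight for the whole run, input elements arrive with a one-cycle skew per PE, and partial sums move one PE down the column per cycle. The array dimensions, the skewing scheme, and the function name `weight_stationary_column` are assumptions made for illustration; the project description does not specify them.

```python
def weight_stationary_column(weights, vectors):
    """Behavioural model of one weight-stationary PE column.

    weights: one stationary weight per PE.
    vectors: input vectors, each with one element per PE.
    Returns one dot product per input vector, in order.
    """
    n = len(weights)
    psum_regs = [0] * n  # registered partial-sum output of each PE
    results = []
    for cycle in range(len(vectors) + n - 1):
        next_regs = [0] * n
        for i in range(n):
            # Skewed feed: PE i sees element i of vector (cycle - i).
            t = cycle - i
            x = vectors[t][i] if 0 <= t < len(vectors) else 0
            psum_in = psum_regs[i - 1] if i > 0 else 0
            next_regs[i] = psum_in + weights[i] * x
        psum_regs = next_regs
        # The last PE emits its first complete sum after n - 1 cycles.
        if cycle >= n - 1:
            results.append(psum_regs[n - 1])
    return results


# Three PEs holding weights 1, 2, 3 and three streamed input vectors.
print(weight_stationary_column([1, 2, 3], [[1, 1, 1], [1, 0, 2], [0, 0, 1]]))
# prints [6, 7, 3]
```

For a convolution, each input vector would be a flattened window of the input feature map and each column would hold the weights of one output channel. The weights only change during a prefetch phase, which is why the inputs can stream through continuously in between.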



## 4. Timing Diagram

- Files Initialize -> Weights Prefetch -> Convolution -> Weights Prefetch -> Convolution -> ...
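
The phase order above can be written down as a small Python helper, for example to check a simulation transcript against the expected sequence. This is only a sketch: the function name `phase_sequence` is hypothetical, and the phase labels are copied from the line above rather than from the design's actual state names.

```python
def phase_sequence(iterations):
    """Return the expected phase order for a given number of iterations."""
    phases = ["Files Initialize"]  # done once, before the first iteration
    for _ in range(iterations):
        # Weights are loaded before every convolution pass.
        phases += ["Weights Prefetch", "Convolution"]
    return phases


print(" -> ".join(phase_sequence(2)))
```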





## 5. Expected Results

- Execution time is approximately 334055 ns for 2 iterations with a 100 MHz operating clock
- If the accelerator has no bugs, the message "Successfully Completed" is displayed
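
As a back-of-the-envelope check, the execution time can be converted to clock cycles and compared with the number of output pixel positions. The sketch below assumes a throughput of one pixel position per cycle during convolution; that figure is not stated in the project description, so the "overhead" is only an estimate under that assumption.

```python
CLOCK_HZ = 100_000_000      # 100 MHz operating clock
EXEC_TIME_NS = 334_055      # execution time quoted above
ITERATIONS = 2
WIDTH, HEIGHT = 128, 128    # feature map size

period_ns = 1_000_000_000 // CLOCK_HZ            # 10 ns per cycle
total_cycles = EXEC_TIME_NS / period_ns          # 33405.5
pixel_positions = ITERATIONS * WIDTH * HEIGHT    # 32768
# ASSUMPTION: one pixel position is processed per cycle.
overhead_cycles = total_cycles - pixel_positions # 637.5

print(total_cycles, pixel_positions, overhead_cycles)
```

Under this assumption, roughly 98% of the run time would be spent streaming input pixels, with the remainder covering file initialization, weight prefetch, and pipeline fill and drain.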



```
# File Open Successful: ../../sources/dataset/output_feature0.bmp
# bmp_width: 128, bmp_height: 128, bmp_bitdepth: 8
# File Open Successful: ../../sources/dataset/output_feature1.bmp
# bmp_width: 128, bmp_height: 128, bmp_bitdepth: 8
# File Open Successful: ../../sources/dataset/output_feature2.bmp
# bmp_width: 128, bmp_height: 128, bmp_bitdepth: 8
# Error Check Start
# Error Check Start
# Error Check Start
# File Open Successful: ../../sources/dataset/input_feature.bmp
# bmp_width: 128, bmp_height: 128, bmp_bitdepth: 24
# File Open Successful: ../../sources/dataset/kernel0.bmp
# bmp_width: 3, bmp_height: 3, bmp_bitdepth: 24
# File Open Successful: ../../sources/dataset/kernel1.bmp
# bmp_width: 3, bmp_height: 3, bmp_bitdepth: 24
# File Open Successful: ../../sources/dataset/kernel2.bmp
# bmp_width: 3, bmp_height: 3, bmp_bitdepth: 24
# Prefetch Start
# Prefetch Start
# Prefetch Start
# Input Feature Start
# Error Count Result: 0
# Successfully Completed
# Error Count Result: 0
# Successfully Completed
# Error Count Result: 0
# Successfully Completed
# Error Check Start
# Error Check Start
# Error Check Start
# Prefetch Start
# Prefetch Start
# Prefetch Start
# Input Feature Start
# Error Count Result: 0
# Successfully Completed
# Error Count Result: 0
# Successfully Completed
# Error Count Result: 0
# Successfully Completed
# ** Note: $stop : H:/Intel_Project/conv/sources/testbench/tb_conv_top.sv(273)
#    Time: 334085 ns Iteration: 0
```

## 6. References

- SCALE-Sim: Systolic CNN Accelerator Simulator (https://arxiv.org/abs/1811.02883)
- CompAct: On-chip Compression of Activations for Low Power Systolic Array Based CNN Acceleration (https://dl.acm.org/doi/fullHtml/10.1145/3358178)
- ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators (https://arxiv.org/abs/1802.03806)